Abstract

Background: Identification of ligand-protein binding interactions is a critical step in drug discovery. Experimental
screening of large chemical libraries, in spite of its specific role and importance in drug discovery, suffers from the
disadvantages of being random, time-consuming and expensive. To accelerate the process, traditional structure- or
ligand-based virtual ligand screening (VLS) approaches are combined with experimental high-throughput screening (HTS). Often a single
protein or, at most, a protein family is considered. Large scale VLS benchmarking across diverse protein families is
rarely done, and the reported success rate is very low. Here, we demonstrate the experimental HTS validation of a
novel VLS approach, FINDSITE^comb^, across a diverse set of medically-relevant proteins.

Results: For eight different proteins belonging to different fold-classes and from diverse organisms, the top 1% of
FINDSITE^comb^'s VLS predictions were tested, and depending on the protein target, 4%-47% of the predicted ligands
were shown to bind with µM or better affinities. In total, 47 small molecule binders were identified. Low nanomolar
(nM) binders for dihydrofolate reductase and protein tyrosine phosphatases (PTPs) and micromolar binders for the
other proteins were identified. Six novel molecules had cytotoxic activity (<10 µg/ml) against the HCT-116 colon
carcinoma cell line and one novel molecule had potent antibacterial activity.

Conclusions: We show that FINDSITE^comb^ is a promising new VLS approach that can assist drug discovery.

Keywords: Drug discovery, Virtual ligand screening (VLS), High-throughput screening (HTS), Differential scanning
fluorimetry (DSF), Ligand homology modeling 



Background 

Traditional experimental approaches to drug discovery 
rely on two different strategies [1]. The first selects a reliable
therapeutic target that might be essential for an
organism's or cell's survival, and then, using chemical library
screening, potential leads that bind to and modulate
the activity of the target in vitro and subsequently,
in vivo, are identified. The second approach tests small 
molecules on animal disease models or cell cultures 
(called phenotypic screening), and once activity is 
gleaned, the protein target is experimentally identified 
by target deconvolution [2]. Both approaches have con- 
tributed to the discovery of new drugs despite suffering 
from substantial disadvantages of high cost and time. 
Fragment-based drug discovery approaches have recently
gained prominence as a distinct and complementary ap- 
proach to drug discovery [3]. Integration of a robust 
VLS methodology with experimental HTS approaches 
constitutes one of the many methods that might accelerate
the drug discovery process [4].

Despite its current limitations, VLS may be employed 
as a possible first step in drug discovery [5]. It not only 
aids in the selection of an appropriate protein target but 
also narrows down the chemical space that is experi- 
mentally screened to arrive at significant protein-ligand 
interactions. In practice, both ligand- and structure- 
based VLS approaches [6] have been used. The principal 
disadvantage of a ligand-based approach is the need for 
a priori knowledge of a set of ligands known to bind to 
the target [7]. Structure-based approaches require a 
high-resolution structure of the target; this situation 
typically only holds for a minority of proteins in a given
proteome [8]. To overcome these limitations, ligand 
homology modeling (LHM) was developed to predict li- 
gands that bind to the protein target [9-11]. LHM relies 
on the fact that evolutionarily distant proteins share 
functional overlap and their ligand-binding information 
provides diverse bound ligands that can be employed in 
a general VLS approach. Thus, it does not suffer from 
the limitations of quantitative structure-activity relationship
(QSAR)-based approaches. In large scale benchmarking,
the FINDSITE^comb^ LHM approach exhibited
significant performance advantages over traditional approaches
in terms of enrichment factor, speed, and insensitivity
as to whether experimental or predicted
protein structures are used [12]. However, experimental 
assessment of the method, where blind predictions are 
made and then experimentally tested, has not been done. 
To ensure robustness, a diverse set of proteins and li- 
gands must be examined, and the strengths and limita- 
tions of the approach demonstrated. 

A reliable and fast method that would test VLS predic- 
tions and identify hits could help accelerate the drug- 
discovery process. This could help alleviate the inherent 
complexity of treating diseases due to cross-reactivity and 
could address the rapid evolution of resistance to available 
drugs by pathogenic microbes. We have resorted to the 
thermal shift assay methodology to assess the predictions 
from VLS [13]. The methodology is an inexpensive way to 
assess the binding of small-molecules by the stability they 
confer on thermal denaturation of the protein target of 
interest. Upon thermal denaturation, the hydrophobicity 
of proteins increases, leading to an increase in fluores- 
cence of an extrinsic fluorophore reporter dye. This 
method is amenable to miniaturization and can screen 
hundreds of molecules simultaneously for their ability to 
bind to the protein target of interest. 
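
To illustrate how a melting-temperature shift might be read from such fluorescence data, the sketch below estimates the transition mid-point as the temperature at which fluorescence rises most steeply, and reports the shift between a ligand-bound and a protein-only curve. The synthetic sigmoid curves, the function names, and the derivative-based estimate are all assumptions made for illustration; this text does not specify the fitting procedure actually used.

```python
import math

def boltzmann(temp, tm, slope=1.5, f_min=0.0, f_max=1.0):
    """Idealized two-state melt curve (synthetic data, for illustration only)."""
    return f_min + (f_max - f_min) / (1.0 + math.exp((tm - temp) / slope))

def estimate_tm(temps, fluorescence):
    """Return the midpoint of the temperature interval with the steepest rise."""
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_tm = 0.5 * (temps[i] + temps[i + 1])
    return best_tm

# Hypothetical scan from 25 to 95 degrees C in 0.5 degree steps.
temps = [25.0 + 0.5 * i for i in range(141)]
apo = [boltzmann(t, tm=50.25) for t in temps]    # protein alone
bound = [boltzmann(t, tm=62.25) for t in temps]  # protein plus ligand

# A positive shift indicates stabilization of the protein by the ligand.
delta_tm = estimate_tm(temps, bound) - estimate_tm(temps, apo)
print(f"Thermal shift: {delta_tm:.2f} degrees C")
```

With real data, noise usually makes a fitted sigmoid or a smoothed derivative preferable to the raw finite difference used here.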

Recognizing the importance of these issues, in the 
present paper, to assess if FINDSITE^comb^ [12] can improve
VLS, we selected an assortment of medically-relevant proteins
with differing fold-architectures from diverse organisms
including the causative agents of human and primate
malaria, Plasmodium falciparum and Plasmodium knowlesi,
an opportunistic pathogen Escherichia coli, and proteins
implicated in mammalian disorders (from Homo
sapiens and Rattus norvegicus). For these proteins, top
ranked ligands predicted by FINDSITE^comb^ are experimentally
assessed for binding by thermal-melt assays.
After validating the small molecule binding predictions, 
we tested their physiological function by their ability to kill 
bacteria such as multi-drug resistant E. coli (MDREC), 
methicillin-resistant Staphylococcus aureus (MRSA), 
Vancomycin-resistant Enterococcus faecium (VREF), and 
their cytotoxic activity against the HCT-116 colon carcinoma
cell line. The encouraging experimental results
for both binding and physiological activity show that
FINDSITE^comb^ is an effective VLS tool.

Results 

This section summarizes the results from FINDSITE^comb^'s
VLS predictions on eight different proteins and their validation
by the thermal shift assay methodology.

Prior to assessing the VLS results on the eight protein 
test set, the thermal shift methodology was validated on 
three proteins having known binding and nonbinding ligands.
Only cognate protein-ligand pairs showed shifts
in the transition mid-point of thermal melt curves, Tm,
while non-cognate ligands displayed no such shifts
(Additional file 1: Figure S1 and SI).

We next applied the methodology, as shown in Figure 1, 
in benchmark mode to eight diverse proteins; that is,
FINDSITE^comb^ only considered template proteins whose
sequence identity to the target was <30%. Typically, on
the order of 50 ligands per protein gave interpretable thermal
shift curves. Of these, the experiments identified a
total of 47 small-molecule/protein binding interactions
with µM or better affinities. Ten ligands with apparent
nM binding affinities (less than 1 µM) were identified
for dihydrofolate reductase from E. coli and the two
mammalian protein tyrosine phosphatases (PTPs).
Except for a small fraction of known inhibitors, which
further validated the methodology, most are novel. The
prediction success rate ranged from 3.9%
of interpretable curves for the P. falciparum ubiquitin-conjugating
enzyme to almost 47% for dihydrofolate
reductase from E. coli (Table 1). This is a major advancement
over previously reported success rates [14]. The
small-molecules that displayed biological activity had low
µM or nM affinities in the in vitro thermal shift assay
(Table 2; Additional file 1: Tables S3-S5). This supports 
the conjecture that their in vivo biological activity might 
result from binding of the small-molecule with the 
intended target protein. A more detailed summary of the 
results is presented below. 

E. coli dihydrofolate reductase (DHFR)

In silico screening of E. coli DHFR was carried out with
FINDSITE^comb^ in benchmarking mode (Additional file 1:
Table S2A). The top 1% of predictions, comprising 83 small-molecules,
was assessed for binding (Table 1). Fifteen
ligands, representing 47% of interpretable curves, showed
binding (Figure 1 and Table 1). Of these 15 hits, representing
µM or better binders, six were previously reported
inhibitors of DHFRs from various organisms [15-19].
Among these known binding molecules, methotrexate
(NSC740) showed the maximum thermal shift of ~30°C
followed by 7H-pyrrolo(3,2-f)quinazoline-1,3-diamine
(NSC339578) [15], methylbenzoprim (NSC382035) [16],
pralatrexate (NSC754230) [17], pemetrexed (NSC698037)
[Figure 1, first panel (flowchart), labels: SEQUENCE; TEMPLATES; PDB LIGANDS; DATABASE LIGANDS (NCI AND ZINC8); THREADING; LIGAND SIMILARITY CALCULATIONS; TOP 1% RANKED LIGANDS; THERMAL SHIFT ASSAY; ANTIBACTERIAL AND ANTICANCER ACTIVITY]
[Figure 1, second panel (thermal melt curves), panel titles: DHFR, 1000001, 1000006, TrpRS, UCE, NAP1, TP2, cDPK; x-axis: Temperature (°C)]

Figure 1 Flowchart of the overall approach and the thermal shift assay results. The first panel shows the in silico approach to predicting
protein-small molecule interactions. All predictions were in benchmarking mode with a 30% template SID cutoff and the top 1% of the hits tested
using thermal-shift assays. The second panel shows a representative fraction of the thermal melt curves that showed positive shifts for the tested
proteins. The numbers are the NSC notation that identifies each small-molecule. DHFR is E. coli dihydrofolate reductase, 1000001 is a PTP from
R. norvegicus, 1000006 is a PTP from H. sapiens, TrpRS is tryptophanyl tRNA synthetase from H. sapiens, UCE is ubiquitin-conjugating enzyme from
P. falciparum, NAP1 is nucleosome assembly protein 1 from P. knowlesi, TP2 is thioredoxin peroxidase 2 from P. falciparum and cDPK is the
wild-type cAMP-dependent protein kinase, catalytic subunit from H. sapiens. Small-molecule binders were tested for their antimicrobial activity and for their cytotoxic
activity against the HCT-116 colon carcinoma cell line.



[18] and 6,7-bis(4-aminophenyl)pteridine-2,4-diamine
(NSC61642) [19]. The approximate dissociation constant
(Kd) of 62 nM for the enzyme-methotrexate (NSC740)
complex matches reported literature values, which range
from 2 to 50 nM [20-22], within experimental error. Thus,
the thermal shift methodology provides an approximate
Table 1 Results from the thermal shift assays on eight proteins, ranked by best ligand binding

Protein  Organism       No. of ligands  No. of good  No. of +ve shifts /  Best hit  ΔTm (°C)  Kd (nM)
                        tested          curves       % +ve shifts         (NSC)
DHFR     E. coli        83              32           15/46.9              309401    30.74     8.21
1000006  H. sapiens     59              43           6/13.9               133351    16.76     168.29
1000001  R. norvegicus  86              42           10/23.8              134137    12.30     406.0
TrpRS    H. sapiens     94              12           5/41.7               750690    14.57     1277.51
UCE      P. falciparum  80              51           2/3.9                93427     14.86     1376.09
TP2      P. falciparum  67              12           2/16.7               106231    5.7       40872.77
cDPK     H. sapiens     -               19           3/15.8               27032     2.95      48538.90
NAP1     P. knowlesi    82              54           4/7.4                36398     2.21      180135.58

1000001: Carboxy-terminus phosphatase domain of protein tyrosine phosphatase (2NV5), DHFR: Dihydrofolate reductase, UCE: Ubiquitin conjugating enzyme,
TrpRS: Tryptophanyl tRNA synthetase, TP2: Thioredoxin peroxidase 2, 1000006: catalytic domain of protein tyrosine phosphatase (2G59), cDPK: Catalytic subunit
of cAMP-dependent protein kinase, NAP1: Nucleosome assembly protein 1. Positive thermal shift is indicated by the notation +ve. Kd indicates dissociation
constant. The dissociation constants reported in this table are computed from the thermal shifts obtained. The values reported in this table are experimental
in-vitro values.



Kd. The five other known inhibitors bind DHFR with low
µM or nM Kds (Additional file 1: Table S3).

Nine small molecules are novel hits with no reported 
binding to/activity against DHFRs. These molecules are 
chemically diverse. The 15 different hits cluster into 10 
distinct chemical classes based on a Tanimoto coefficient 
(TC) cutoff of 0.7 (Additional file 1: Figure S2A).
NSC309401, the top novel hit in Table 1, showed apparently
better binding to E. coli DHFR than methotrexate
(Kd of 48 nM and a thermal shift of almost 31 degrees)
and showed inhibition against several antibiotic-resistant
microbial strains (Table 2). It displayed a promising MIC
of 7.8 µg/mL against E. coli DH5α and a reasonable
MIC (31.25 µg/mL) against MRSA and VREF. It also has
very potent activity against the HCT-116 colon carcinoma
cell line with an IC-50 of 0.13 µg/mL (Table 2).

This corroborates findings from the NCI human tumor 
cell line growth inhibition assay showing that this 
molecule has activity (potency not revealed) against several
cancer cell lines including melanoma, prostate,
colon, and breast (http://pubchem.ncbi.nlm.nih.gov, CID: 
24198955, substance SID: 573494, compound name: 
MLS002701801) [23]. We posit that its activity is at least 
partly due to DHFR inhibition. Since NSC309401 inhibits 
both prokaryotic and eukaryotic systems, it might be a 
broad specificity antifolate. 2,4-Diaminoquinazolines and
Table 2 Antimicrobial and anticancer activities of a representative set of small-molecules

Protein  Identity (NSC)  DH5α (MIC)  MDREC (MIC)  MRSA (MIC)  VREF (MIC)  HCT-116 (IC-50)
DHFR     309401          7.813       125          31.25       31.25       0.130
         740*            ND          ND           ND          500         0.048
         339578*         62.5        250          31.25       31.25       6.11
         382035*         ND          ND           31.25       31.25       0.182
         754230*         ND          ND           ND          ND          <<0.031
1000001  111552          NA          NA           NA          NA          2.2
         246131†         NA          NA           NA          NA          -
         30205           NA          NA           NA          NA          0.146
         88882           NA          NA           NA          NA          4.44
         106863          NA          NA           NA          NA          14.5
1000006  92794           NA          NA           NA          NA          9.78
TrpRS    750690‡         NA          NA           NA          NA          1.11
         88882           NA          NA           NA          NA          4.44
         37168           NA          NA           NA          NA          1.34

*Reported inhibitors of DHFR independently picked up by our predictions and validated experimentally. †Small molecule with known anticancer properties
(valrubicin). ‡Small molecule with known anticancer properties (Sunitinib). MIC: Minimum inhibitory concentration required for 90% clearance, µg/mL units. ND:
No significant inhibition. NA: not applicable. DH5α: E. coli strain DH5α, MRSA: Methicillin-resistant S. aureus, MDREC: Multi-drug resistant E. coli, VREF: Vancomycin-resistant
E. faecium, HCT-116: Colon carcinoma cell line. IC-50: inhibitory concentration for 50% growth inhibition, µg/mL units. For additional details, see legend from Table 1. The
values reported in this table are experimental in-vitro values.



their derivatives are known to inhibit DHFR (a prominent 
example is trimetrexate) (Rosowsky, et al, 1995) but
their structures are different from NSC309401, a
7-[(4-aminophenyl)methyl]-7H-pyrrolo[3,2-f]quinazoline-1,3-diamine,
in that the latter compound has a novel tricyclic
heterocycle.

Another interesting small molecule, with no previously 
reported binding to DHFR, was NSC80735, with a Kd of
1.7 µM and a MIC of 10.9 µg/mL against HCT-116
(Additional file 1: Table S3). The other novel hits had affinities
ranging from 6-75 µM; these hits represent potential
compounds that could be improved to increase
their medical significance vis-a-vis DHFR inhibition. A
single novel hit had a poor affinity of ~460 µM.
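
The grouping of hits into chemical classes by a Tanimoto coefficient cutoff can be sketched as follows. This is a minimal illustration, not the authors' procedure: the fingerprints are represented as sets of on-bit indices, the molecule names and bit sets are hypothetical, and single-linkage grouping at TC >= 0.7 is an assumed clustering rule, since neither the fingerprint type nor the clustering algorithm is stated here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cluster(fingerprints, cutoff=0.7):
    """Single-linkage grouping: molecules with TC >= cutoff share a cluster."""
    names = list(fingerprints)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if tanimoto(fingerprints[x], fingerprints[y]) >= cutoff:
                parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical toy fingerprints (not real molecules).
toy = {
    "mol_a": {1, 2, 3, 4, 5, 6, 7, 8},
    "mol_b": {1, 2, 3, 4, 5, 6, 7, 9},  # shares 7 of 9 distinct bits with mol_a
    "mol_c": {20, 21, 22, 23},          # no overlap with the others
}
clusters = cluster(toy)
print(len(clusters), "chemical classes:", clusters)
```

A larger number of clusters relative to the number of hits indicates a chemically more diverse hit set, which is the sense in which the cluster counts are cited in this section.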

Protein tyrosine phosphatases (PTP) 

The top 1% of VLS predictions (Additional file 1: Table 
S2B and S2C), representing 86 and 59 molecules, were 
tested on PTPs 1000001 and 1000006, respectively. Ten 
molecules, 24% of the interpretable curves, showed posi- 
tive shifts for PTP 1000001, and six molecules, 14% 
of the interpretable curves, showed positive shifts for 
PTP 1000006 (see Figure 1 and Table 1). However, it 
should be noted here that a few of the reported mole- 
cules have low Q values representing poor signal com- 
pared to the thermal unfolding curve of the protein 
alone (see Materials and Methods) (Additional file 1: 
Table S4). All these compounds are novel hits, with no 
reported binding to/activity against PTPs. At a TC cutoff 
of 0.7, the 10 ligands showing experimental binding
to 1000001 clustered into eight different subgroups
(Additional file 1: Figure S2B), while the six ligands
showing experimental binding to 1000006 clustered into
four different subgroups (Additional file 1: Figure S2C).
This again demonstrates the diversity of ligands selected
by FINDSITE^comb^. Next, 32 predictions ranked below
the top 1% from VLS were randomly selected and tested
experimentally on 1000001 and 1000006 to demonstrate
that the obtained hit rate for the top 1% was appreciably
better than the background. Convincingly, as inferred from
the lack of shift in Tm, none showed any binding.

Among the ten hits for 1000001, seven had µM
affinities and three had nM affinities, with the compound
NSC134137 showing a maximal thermal shift of ~12°C.
This translates into an approximate Kd of 406 nM
(Additional file 1: Table S4). Five of these compounds,
50% of the hits, displayed cytotoxic activity against
HCT-116. Valrubicin (NSC246131), a known anticancer
agent that intercalates with DNA [24], was also shown
to bind to PTP 1000001 with an approximate dissociation
constant of 1.5 µM. NSC246131 binding to PTP 1000001
hints at promiscuity of this molecule. Three hits,
NSC111552, NSC30205 and NSC88882, also showed potent
cytotoxic activity (IC-50 of 2.20 µg/mL, 0.15 µg/mL
and 4.44 µg/mL, respectively), while NSC106863 showed
reasonable cytotoxic activity with an IC-50 of 14.5 µg/mL
against the HCT-116 colon carcinoma cell line (see Table 2;
Additional file 1: Table S4). We note that a single paper
reports the cytotoxic activity of NSC111552 derivatives
against cancer cell lines [25]. While there is no literature
describing the anticancer properties of either NSC30205 or
NSC88882, 9-aminoacridine-based compounds are known 
to be cytotoxic towards cancer cell lines [26-33]. Thus, the 
mode of action of NSC30205 could be similar [31]. We also 
posit that the PTP human homologue is one of the targets 
responsible for the cytotoxic activities of these molecules. 

All six hits for 1000006 have apparent Kds that range
from 168 nM-271.5 µM (Additional file 1: Table S4).
The top hit was NSC133351 with an approximate dissociation
constant of 168.3 nM. NSC92794, with a Kd of
161.9 µM, displayed reasonable cytotoxic activity with
an IC-50 value of 9.8 µg/mL against the HCT-116 colon carcinoma
cell line. None of the other hits of 1000006 displayed
discernible cytotoxic activity. Since 1000001 and
1000006 are both PTPs and share substantial structural 
similarity, there were instances where 1000001 binders 
also bind 1000006 (Additional file 1: Table S6 and SI). 

Ubiquitin-conjugating enzyme (UCE)

For P. falciparum UCE, 80 molecules from the top 1% of
FINDSITE^comb^ predictions (Additional file 1: Table S2G)
were experimentally tested for binding (Table 1); only 51
gave interpretable thermal shift curves. Two molecules,
4% of the interpretable curves, showed binding (see
Figure 1 and Table 1). NSC93427 binds to UCE, with a
thermal shift of ~15°C that translates into an approximate
Kd of 1.4 µM. Another compound, NSC50651, showed an
apparent Kd of 197 µM (Additional file 1: Table S5).
Future studies to assess the inhibition of in vitro cultures
of P. falciparum by these small-molecules are needed to establish
their utility as lead compounds for malaria treatment.

Tryptophanyl tRNA synthetase (TrpRS) 

For TrpRS, 94 compounds from the top 1% of the VLS 
(Additional file 1: Table S2D) were experimentally 
screened (Table 1). Five, constituting 42% of the inter- 
pretable curves, showed thermal shifts (see Figure 1 and 
Table 1). The ligands clustered into three different sub- 
groups (Additional file 1: Figure S2D) based on a TC 
cutoff of 0.7. The most interesting small-molecule that
binds TrpRS was Sunitinib (NSC750690) with an approximate
Kd of 1.3 µM and an IC-50 of 1.1 µg/mL for
HCT-116. The observed effect might be due to its inhibition
of multiple targets (receptor tyrosine kinases are
known Sunitinib targets [34]).

Two other small molecules, NSC88882 and NSC37168,
with Kds of 3.8 µM and 9.1 µM respectively, also
showed potent inhibition of HCT-116, with IC-50s of
4.44 µg/mL and 1.34 µg/mL, respectively (Table 2).
NSC88882 has been shown to possess activity in
several bioassay trials undertaken by the NCI, suggesting
high promiscuity across several protein targets
(http://pubchem.ncbi.nlm.nih.gov/, substance SID: 26665273,
CID: 68249) [31]. NSC37168 also binds multiple targets
within different cell types [3,35]. However, none of
these reports suggest binding/inhibition of TrpRS. Other
compounds that bind TrpRS were NSC50690 and
NSC55152, having Kds of 7.7 µM and 39.6 µM, respectively
(Additional file 1: Table S4).

Thioredoxin peroxidase 2 (TP2), cAMP-dependent protein
kinase (cDPK) and nucleosome assembly protein 1 (NAP1)

TP2 from P. falciparum, the catalytic domain of the
cDPK from H. sapiens and NAP1 from P. knowlesi were
tested with moderate success. Their thermal melt assay 
results are collated in Table 1 and Additional file 1: 
Table S5, with additional VLS results summarized in 
Additional file 1: Table S2F, S2E and S2H, respectively. 
Experimental thermal melt curves are shown in Figure 1. 
As can be seen in Additional file 1: Table S5, all these 
small-molecules bind with µM affinities (ranging from
41 µM-371.5 µM), making a few of them potential candidates
for further development.

Discussion 

In this paper, we describe the large-scale experimental 
validation of the FINDSITE^comb^ VLS methodology and
demonstrate that the approach is applicable to a wide 
variety of proteins. In contrast, previous instances of 
VLS coupled to experimental screening of ligands re- 
ported in the literature mostly concentrate on either 
a single enzyme or a single enzyme family [36-41]. 
FINDSITE^comb^, being a hybrid of structure-based and
ligand-based VLS approaches, has many advantages: It 
identifies a structurally diverse set of ligands as potential 
hits, retains the speed of traditional ligand-based ap- 
proaches, and removes the requirement of traditional 
structure-based approaches that a high-resolution struc- 
ture of the protein target of interest be solved. Thus, 
~75% of a given proteome is accessible to this VLS methodology.
This affords the possibility not only of identifying
novel hits, but also for repurposing FDA approved drugs, 
and concomitantly suggesting possible drug side effects. 

Demonstration of the methodology on a diverse set of 
proteins with differing folds suggests that the method is 
a general and effective approach to discovering novel 
protein-ligand binding interactions. The primary success 
rates of 4%-47% are dramatic when compared to rates 
reported in the literature. Since only a tiny fraction of 
the protein/ligand binding predictions were assessed experimentally
(20-50 of the top ranked predictions from
FINDSITE^comb^), these success rates are even more significant
than the raw numbers would suggest. For instance,
in another study describing the HTS of a diverse
library of 50,000 small-molecules against E. coli DHFR,
the primary hit rate was 0.12% [14], whereas 47% of the
32 molecules predicted by FINDSITE^comb^ bind with µM
affinities or better. Indeed, the finding that many ligands
have Kds in the nM and µM range is encouraging. For
three different proteins, novel nM binders were identi- 
fied. Demonstration of antibacterial and cytotoxic activ- 
ity by some of these compounds further suggests that 
the present methodology is a promising approach to 
identify novel hits and could help enrich the drug dis- 
covery pipeline. However, we are aware that hits gener- 
ated through thermal-shift methodology relying on an 
extrinsic fluorophore will require additional validation. 
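
The hit-rate arithmetic behind this comparison can be reproduced with a short calculation. The sketch below uses the counts of positive shifts and interpretable curves as read from Table 1 and the 0.12% primary HTS hit rate quoted from [14]; the variable names are illustrative. The fold figure is only a rough guide, because the two rates have different denominators (interpretable curves from a pre-ranked top 1% versus an entire screened library).

```python
# (positive shifts, interpretable curves) per target, as read from Table 1.
table1 = {
    "DHFR": (15, 32),
    "1000006": (6, 43),
    "1000001": (10, 42),
    "TrpRS": (5, 12),
    "UCE": (2, 51),
    "TP2": (2, 12),
    "cDPK": (3, 19),
    "NAP1": (4, 54),
}
HTS_HIT_RATE = 0.12  # percent; primary hit rate for E. coli DHFR quoted from [14]

# Percentage of interpretable curves that showed a positive thermal shift.
hit_rates = {p: 100.0 * hits / curves for p, (hits, curves) in table1.items()}
total_hits = sum(hits for hits, _ in table1.values())
fold_over_hts = hit_rates["DHFR"] / HTS_HIT_RATE

for protein, rate in sorted(hit_rates.items(), key=lambda kv: -kv[1]):
    print(f"{protein:8s} {rate:5.1f}%")
print("total binders:", total_hits)
print(f"DHFR hit rate relative to HTS: about {fold_over_hts:.0f}-fold")
```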

Not only has a methodological advance been demon- 
strated, but also the results hold possible medical signifi- 
cance. We have identified several interesting hits that 
might represent starting scaffolds for drug design for a 
number of clinically important protein targets. For ex- 
ample, DHFR, a pivotal enzyme in the nucleotide bio- 
synthetic pathway in E. coli [42] evolves resistance to 
available inhibitors by several mechanisms [43,44]. This 
is a major problem because drug-resistant E. coli causes
the highest number of infections in hospitalized patients 
[35]. Thus, there is an urgent need to identify novel po- 
tent inhibitors of DHFR. In that regard, the current 
study provides nine novel structurally diverse small-molecule
binders with apparent affinities ranging from
nM to µM that are interesting hits that could be developed
as lead molecules for E. coli DHFR inhibition. By
assessing the potential of these ligands against a diverse 
set of drug-resistant microbial strains and colon cancer 
cells, we established the range of effectiveness of these 
compounds. A potent antibacterial and 7 molecules with 
cytotoxic effect against HCT-116 colon carcinoma cell 
line were found. This information can be exploited in 
designing species-specific inhibitors. Yet other examples 
are the pathogens P. falciparum, which causes malignant
malaria in humans, and P. knowlesi, implicated in an
emergent form of malaria that can infect humans [45]. 
Rapid evolution of resistance to known antimalarials is a 
major issue [46]. The present study yielded 8 hits to 
three different enzymes that carry out critical processes 
of ubiquitin-mediated post-translational modification 
(UCE) [47], oxidative protection of the parasite during 
its intraerythrocytic stages (TP2) [48] and histone transport
and chromatin assembly (NAP1) [49], in the pathogen.
Finally, four distinct target proteins representing
members of three families, tRNA synthetases [50], phos- 
phatases and kinases [51,52] implicated in diseases such 
as cancer, were examined with 24 novel protein-ligand 
binding interactions reported. Interestingly, these studies 
also identified unanticipated binding interactions of 
well-known drugs with alternative targets. Sunitinib, a 
well-documented inhibitor of receptor tyrosine kinases 
[34], binds to TrpRS with high affinity. This reinforces
the belief that drug molecules, at least partly, work by 
interfering with the function of multiple targets within 
the cellular milieu. It is well known that developing a
new drug is a time-consuming and expensive process
that can take 12-15 years. Such off-target interactions 
could be exploited towards repurposing available drugs 
for alternative protein targets, thus reducing the cost 
and time duration of drug-discovery. 

Conclusions 

In conclusion, we have demonstrated that FINDSITE^comb^
is an automated, robust and rapid methodology that can 
identify novel protein-ligand binding interactions that are 
often in the nM range or better, and which, in combin- 
ation with appropriate mechanistic studies and biological 
activity assays can be a promising tool for lead identifica- 
tion/drug discovery. The presented results show that pre- 
dicted structures can be successfully used for virtual 
ligand screening, and by exploiting the ideas of LHM, di- 
verse novel small molecule binders can be identified even 
when the closest template is distantly related to the 
protein target of interest. Since medically relevant pro- 
teins often have a large number of evolutionarily related 
solved, holo protein structures that can serve as tem- 
plates, they are a particularly good class of targets for 
the present methodology. However, we note that the 
methodology also works when there are few solved holo 
template structures in the PDB, e.g. for GPCRs [12].
Work is now in progress to extend and experimentally 
validate the approach on a broader class of proteins 
and small molecule ligands. 

Methods 

Details about reagents are provided in SI 

Figure 2 shows the flowchart of the FINDSITE^comb^ methodology
[12] in combination with the experimental validation
protocol. FINDSITE^comb^ is a composite approach consisting
of the improved FINDSITE-based approach [9],
FINDSITE^filt^, and the extended FINDSITE-based approach,
FINDSITE^X^ [53]. In what follows, we detail the
two FINDSITE-based component approaches and their
benchmarking and prediction results.

FINDSITE^filt^ for ligand virtual screening using
experimental bound structures 

The FINDSITE^filt^ flowchart is shown in Figure 3(A) and
consists mainly of three steps: (A) Finding a sub-set of protein
templates in the library of holo PDB structures (experimental
structures with bound ligands) that are putatively
evolutionarily related to the target using target sequence
and threading approaches; (B) Filtering the sub-set of holo
PDB structures using the target structure (experimental or
modeled) and structure comparison methods; (C) Selecting
pockets and ligands from the filtered sub-set for binding
site and virtual screening predictions.

FINDSITE^filt^ [12] employs a heuristic structure-pocket
alignment procedure and a sequence dependent scoring 
function to rank the holo templates in step (B) above. The
alignment is evaluated using the sequence dependent score: 

$$\text{SP-score} = \sum_{\text{aligned residues } a,b} \mathrm{BLOSUM62}(a, b) \tag{1}$$

where BLOSUM62(a,b) is the BLOSUM62 substitution 
matrix [54]. Templates are ranked by their SP-scores and
the ligands corresponding to the top 100 templates are selected
as template ligands for ligand virtual screening.
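
As a minimal sketch of Eq. (1) and the ranking step, the code below sums substitution scores over aligned residue pairs and orders templates by the result. Only a five-entry excerpt of BLOSUM62 is included to keep the example self-contained; the function names, the template names, and the toy alignments are hypothetical, and the structure-pocket alignment that would produce the residue pairs is not shown.

```python
# Small excerpt of BLOSUM62 for illustration; a real run would use the full matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4,
    ("W", "W"): 11,
    ("L", "I"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
}

def substitution_score(a, b, matrix):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]

def sp_score(aligned_pairs, matrix):
    """Eq. (1): sum of substitution scores over aligned residue pairs."""
    return sum(substitution_score(a, b, matrix) for a, b in aligned_pairs)

def rank_templates(alignments, matrix, top_n=100):
    """Order template names by SP-score, highest first, and keep the top_n."""
    ranked = sorted(
        alignments,
        key=lambda name: sp_score(alignments[name], matrix),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical alignments: (target residue, template residue) pairs.
alignments = {
    "template_1": [("A", "A"), ("W", "W"), ("L", "I")],  # 4 + 11 + 2
    "template_2": [("D", "E"), ("K", "R")],              # 2 + 2
}
ranking = rank_templates(alignments, BLOSUM62_EXCERPT)
print(ranking)
```

The ligands bound to the retained templates would then serve as the template ligands for the similarity search against the compound library.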

FINDSITE^X^ for ligand virtual screening using experimental
binding data without bound structures 

FINDSITE^filt^'s performance relies on the existence of a
sufficient number of holo PDB structures homologous to
the target. This is not true for most membrane proteins
where even apo structures (structures without bound ligands)
are rare. Thus, for some of the most interesting
drug targets, such as the G-Protein Coupled Receptors
(GPCRs) and ion-channels, FINDSITE^filt^ has limited performance.
The FINDSITE^X^ approach [53] was developed
to overcome the shortcomings of FINDSITE^filt^ on these
kinds of targets. The flowchart of FINDSITE^X^ is shown in
Figure 3(B). FINDSITE^X^ utilizes experimental binding
data without ligand bound experimental structures. To
benefit from structure comparison, structures of
proteins in the experimental ligand binding databases are modeled.
FINDSITE^X^ uses the fast version of the structure
modeling approach TASSER^VMT^ [55] (TASSER^VMT^-lite
[53]) to create a virtual library of protein-ligand structures
analogous to the PDB holo structures but without experimentally
solved protein-ligand complex structures. Since
there is no reliable pocket information for the virtual holo
structures, whole structure comparison of the target to the
templates (in the virtual holo structures) using fr-TM-align
[56] is used. To reduce false positives, especially for
targets like GPCRs where almost all structures are similar
(TM-score > 0.4), a sequence dependent score similar to
the SP-score in Eq. (1) over the fr-TM-aligned residues is
used instead of the TM-score. The ligands of the top
ranked templates are used as template ligands for searching
against the compound library. To identify template-ligand
pairs, the DrugBank drug-target relational database [57]
and the ChEMBL bioactivity database [58] are used.

FINDSITE^comb for ligand virtual screening

FINDSITE^comb is the combination of FINDSITE^filt, which uses holo PDB
structures as templates, and FINDSITE^X, which utilizes two independent
ligand binding databases. For a given target and compound library, if no
target structure is provided as input, TASSER^VMT-lite [53] models the
structure. Then, three independent virtual ligand screening runs are
conducted: (a) FINDSITE^filt using the holo PDB structure library;
(b) FINDSITE^X using the DrugBank virtual holo structure library; and
(c) FINDSITE^X using the ChEMBL virtual holo structure library. For each of
these virtual screening runs, the following score is used to measure the
likelihood that a compound is a true ligand of the target:



$$mTC = w \, \frac{\sum_{l=1}^{N_{lg}} TC(L_l, L_{lib})}{N_{lg}} + (1 - w) \max_{l \in (1,\ldots,N_{lg})} TC(L_l, L_{lib}), \qquad (2)$$



where TC stands for the Tanimoto coefficient [59]; N_lg is the number of
template ligands from the putative evolutionarily related proteins; L_l and
L_lib stand for the template ligand and the ligand in the compound library,
respectively; and w is a weight parameter. The first term is the average
TC [11]. The second term is the maximal TC between a given compound and all
the template ligands. Here, we empirically choose w = 0.1 to give more
weight to the second term so that when the template ligands are true ligands
of the target, they will be favored. For a given compound, the three
independent virtual screenings give three mTC scores, and the maximal score
is used for the combined ranking.
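To make the scoring concrete, the following Python sketch evaluates the mTC score and the combined (maximal) score for toy data. It is only an illustration: fingerprints are represented as sets of on-bit indices, all names and example values are hypothetical, and skipping a run that returns no template ligands is an assumption rather than something stated in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def mtc(compound_fp, template_fps, w=0.1):
    """mTC score: w * (average TC) + (1 - w) * (maximal TC) over template ligands."""
    tcs = [tanimoto(template, compound_fp) for template in template_fps]
    return w * sum(tcs) / len(tcs) + (1.0 - w) * max(tcs)


def combined_score(compound_fp, template_sets, w=0.1):
    """Maximal mTC over the independent screening runs.

    Runs without any template ligands are skipped (an assumption of this sketch).
    """
    return max(mtc(compound_fp, fps, w) for fps in template_sets if fps)


# Toy fingerprints with hypothetical bit indices.
compound = {1, 2, 3, 4}
pdb_templates = [{1, 2, 3, 4}, {5, 6}]        # TC = 1.0 and 0.0
drugbank_templates = [{1, 2}, {3, 4, 7, 8}]   # TC = 0.5 and 1/3
chembl_templates = []                         # no template ligands found

print(mtc(compound, pdb_templates))
print(combined_score(compound,
                     [pdb_templates, drugbank_templates, chembl_templates]))
```

With w = 0.1, the first run scores 0.1 x 0.5 + 0.9 x 1.0 = 0.95 because one template ligand is identical to the compound, which shows how the maximal-TC term dominates the ranking.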

In this study, to experimentally validate FINDSITE^comb under non-trivial
conditions, i.e. when there are no closely homologous templates for the
target, we have excluded all templates having sequence identity > 30% to
the given target from the PDB holo structures, DrugBank targets and
ChEMBL targets.

Comparison of FINDSITE^comb to traditional docking-based
methods

We previously conducted a benchmarking test of FINDSITE^comb on the DUD set
(A Directory of Useful Decoys [60]) and compared our results to
state-of-the-art docking-based methods for ligand virtual screening. The
DUD set is designed to help test docking algorithms by providing challenging
decoys. It has a total of 2,950 active compounds for 40 protein targets. For
each active, there are 36 decoys with similar physical properties (e.g.
molecular weight, calculated LogP) but dissimilar topology. Two freely
available traditional docking methods, AUTODOCK Vina [61]
(http://vina.scripps.edu/) and DOCK 6 [62]
(http://dock.compbio.ucsf.edu/DOCK_6/), were compared to FINDSITE^comb.
AUTODOCK Vina was tested on the DUD set and shown to be a strong competitor
against some commercially distributed docking programs
(http://docking.utmb.edu/dudresults/). DOCK 6 is an update of the DOCK 4
program [62]. These two methods represent state-of-the-art traditional
docking-based approaches that are computationally expensive but, as opposed
to traditional ligand similarity-based approaches, do not require a known
set of binders for a given target. FINDSITE^comb also does not require a
known set of binders for the target, but is an order of magnitude faster
than docking methods. Most importantly, FINDSITE^comb does not require a
high-resolution experimental structure of the target. Thus, it is applicable
to screening both large compound libraries and genome-scale sets of targets.

The performance of a given approach for virtual screening is evaluated by
the Enrichment Factor (EF) within the top x fraction (or 100x%) of the
screened library compounds, defined as:

$$EF_x = \frac{\text{Number of true positives within top } 100x\%}{\text{Total number of true positives} \times x} \qquad (3)$$
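As a worked illustration of Eq. (3), the short Python sketch below computes the enrichment factor from a ranked list of active/inactive flags. The function name, the rounding used to size the top slice and the toy library are assumptions made for this example only.

```python
def enrichment_factor(ranked_is_active, x):
    """EF_x = (true positives within the top 100x%) / (total true positives * x).

    ranked_is_active: booleans ordered from best- to worst-scored compound.
    """
    n_top = round(len(ranked_is_active) * x)   # size of the top slice (assumed rounding)
    total_actives = sum(ranked_is_active)
    if n_top == 0 or total_actives == 0:
        raise ValueError("need a non-empty top slice and at least one true positive")
    hits = sum(ranked_is_active[:n_top])
    return hits / (total_actives * x)


# Toy library of 1,000 compounds with 10 actives, 4 of them ranked in the top 10.
ranking = [True] * 4 + [False] * 990 + [True] * 6
# Ideal case: all 10 actives ranked first.
perfect = [True] * 10 + [False] * 990

print(enrichment_factor(ranking, 0.01))
print(enrichment_factor(perfect, 0.01))
```

The first call gives an EF of 40 (4 of 10 actives recovered in the top 1%), and the ideal ranking gives the maximum of 100 for x = 0.01. Random selection would recover about 1% of the actives, i.e. an EF of about 1.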



A true positive is defined as an experimentally known binding ligand/drug
or one that has a TC = 1 to an experimentally validated binding ligand/drug.
For x = 0.01, EF_0.01 ranges from 0 to 100 (100 means that all true
positives are within the top 1% of the compound library). Another evaluation
quantity employed here is the AUAC (area under the accumulative curve of the
fraction of true positives versus the fraction of the screened library).

The performance of the three approaches on the DUD set using experimental
target structures is shown in Table 3. FINDSITE^comb shows about 3 times the
EF_0.01 of AUTODOCK Vina or DOCK 6 for the top 1% of selected compounds,
with an EF_0.01 of 13.4 versus 4.80 and 3.72, respectively. FINDSITE^comb
also has significantly better overall performance in terms of its AUAC
(0.774 vs. 0.586 and 0.426). Although we do not have direct access to some
of the commercially available approaches compared in Ref. [63], we note that
FINDSITE^comb has a better AUAC than the best performing GLIDE (v4.5)
[64,65] (mean AUAC = 0.72) and all other compared methods: DOCK 6 (mean
AUAC = 0.55), FlexX [66] (mean AUAC = 0.61), ICM [67,68] (mean AUAC = 0.63),
PhDOCK [69,70] (mean AUAC = 0.59) and Surflex [71-73] (mean AUAC = 0.66)
[63]. The results of DOCK 6 in Ref. [63] are better than those in Table 3
because Ref. [63] used flexible docking and expert input preparation,
whereas here we employed default input and rigid docking.

We next examined the effect of target structure quality on the performance
of the methods. In Table 4, we show the enrichment factors EF_0.01 and
EF_0.1 of the different methods using experimental and modeled target
structures for a subset of 30 targets from the DUD set. The other 10 targets
are not included because their modeled structures have long extended tails
(i.e. are not compact) and their dimensions are too large for the docking
methods. The results of FINDSITE^comb change very little when modeled rather
than experimental structures are used. This is not the case for either
DOCK 6 or AUTODOCK Vina, whose performance deteriorates significantly.



Table 3 Performance of methods on the 40 protein DUD
set using experimental structures

Method          Average EF_0.01   Average EF_0.05   Average EF_0.1    Average AUAC
FINDSITE^comb   13.4              6.56              4.37              0.774
AUTODOCK Vina   4.80              3.01              2.40              0.586
                (5.3x10^-?)a      (9.4x10^-?)       (7.7x10^-?)       (3.0x10^-?)
DOCK 6          3.72              1.79              1.24              0.426
                (1.5x10^-?)       (1.8x10^-?)       (9.9x10^-?)       (1.3x10^-??)



Large scale testing of FINDSITE^comb on generic drug
targets

Since FINDSITE^comb is much faster than traditional docking approaches and
can use modeled as well as experimental structures, we can perform
large-scale testing on drug targets (some of which lack experimental
structures). This kind of test is not feasible for traditional docking
methods. We tested FINDSITE^comb on a set of 3,576 DrugBank [57] targets
that we can confidently model using TASSER^VMT-lite [53]. We use modeled
target structures even for those targets that have experimental PDB
structures. The drugs of all 3,576 targets are buried in a background of
representative compounds from the ZINC8 library [74] that are culled to
TC < 0.7 to each other. The total number of screened compounds for each
target is 74,378 (6,507 drugs + 67,871 ZINC8 compounds).

The test results are shown in Table 5. FINDSITE^comb achieves an average
enrichment factor of 52 for the top 1% of selected compounds (viz. those
ranked within the top 744); moreover, about 65% of the targets have an
EF_0.01 > 1 (EF = 1.0 corresponds to random selection). Thus, on average,
about half of the true drugs of a typical target will show up within the top
1% of the screened compounds. FINDSITE^comb will be helpful in enriching
true binders for 65% of the targets in a typical genome sequence. We note
that FINDSITE^comb is better than any of its individual components. The
major contribution to FINDSITE^comb is from FINDSITE^filt or holo PDB
structure templates.

Footnote a to Table 3: Numbers in parentheses are two-sided p-values of
Student's t-test between FINDSITE^comb and docking methods.

Table 4 Comparison of methods for the 30 protein DUD
set using experimental and modeled structures

Method          Ave. EF_0.01          Ave. EF_0.01          Ave. EF_0.1           Ave. EF_0.1
                (expt. structure)     (modeled structure)   (expt. structure)     (modeled structure)
FINDSITE^comb   14.1                  13.3                  4.54                  4.53
AUTODOCK Vina   5.45                  2.39                  2.48                  1.40
DOCK 6          3.82                  3.05                  1.29                  0.87

Table 5 Performance of FINDSITE methods for 3,576 drug
targets

Method (binding database)   Average EF_0.01     # (%) of targets having EF_0.01 > 1
FINDSITE (PDB)              31.7                1526 (43%)
FINDSITE^X (DrugBank)       36.6                1714 (48%)
FINDSITE^X (ChEMBL)         9.5                 566 (16%)
FINDSITE^filt (PDB)         46.0                2080 (58%)
FINDSITE^comb               52.1                2333 (65%)
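The figures quoted for the large-scale test can be cross-checked with simple arithmetic. The sketch below only redoes that arithmetic from the library size, the target counts and the average EF given in the text and Table 5; the variable names are illustrative.

```python
N_TARGETS = 3576
LIBRARY_SIZE = 6507 + 67871      # drugs + ZINC8 background compounds

# Number of compounds that fall within the top 1% of the ranked library.
top_1_percent = round(0.01 * LIBRARY_SIZE)

# Since EF_x * x equals the fraction of true positives recovered in the top x,
# an average EF_0.01 of 52.1 corresponds to about half of the true drugs.
mean_fraction_recovered = 52.1 * 0.01

# Targets with EF_0.01 > 1 for each method, as listed in Table 5.
targets_enriched = {
    "FINDSITE (PDB)": 1526,
    "FINDSITE^X (DrugBank)": 1714,
    "FINDSITE^X (ChEMBL)": 566,
    "FINDSITE^filt (PDB)": 2080,
    "FINDSITE^comb": 2333,
}
percent_enriched = {method: round(100 * n / N_TARGETS)
                    for method, n in targets_enriched.items()}

print(LIBRARY_SIZE, top_1_percent, mean_fraction_recovered)
print(percent_enriched)
```

The computed cutoff (744 compounds) and the rounded percentages agree with the values reported in the text and in Table 5.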

Experimental validation of FINDSITE^comb

For the blind experimental validation in this work, a compound library
consisting of molecules from the National Cancer Institute (NCI), with
ZINC8 [74] compounds (TC < 0.7) as background, was used. The open chemical
repository maintained by the Developmental Therapeutics Program (DTP) at
NCI/NIH is a comprehensive set of small molecules consisting of compounds
from the diversity set, mechanistic set, natural product set and approved
oncology drug set. Compounds constituting the diversity set were derived
from a parent library of ~140,000 compounds based on the following
criteria: (1) distinctness of the molecule, its pharmacophores and its
conformational isomers, (2) rigidity (5 or fewer rotatable bonds),
(3) planarity and (4) pharmacologically desirable features. Compounds
constituting the mechanistic set were selected from a seed library of 37,836
compounds tested on the NCI human tumor 60 cell line screens and represent
compounds that show a broad range of growth inhibition. Compounds in the
natural product set were selected from 140,000 compounds in the DTP open
repository collection based on (a) origin, (b) purity, (c) structural
diversity (different scaffold structures with varied functional groups),
and (d) availability. The compounds in the approved oncology drug set
consist of current FDA-approved drugs.

NCI molecules were used because they are easy to obtain. The NCI molecules
were downloaded from NCI
(http://dtp.nci.nih.gov/branches/dscb/repo_open.html) and consist of 1597
molecules from the Diversity Set III, 97 from the Approved Oncology Drugs
Set IV, and 118 from the Natural Product Set II (1812 NCI molecules in
total). Importantly, no a priori target-compound binding information is
used in either the virtual screening or the experimental validation.
Together with the ZINC8 background, a total of 69683 molecules are screened
by FINDSITE^comb. NCI molecules ranked within the top 1% (i.e. ranked higher
than ~700th) for each target are subsequently considered for experimental
validation by thermal shift assays.



Acquisition and quantification of thermal shift assays 

High-throughput thermal shift assays were carried out following established
guidelines (Additional file 1: Table S1) [13,75]. Protein melting curves
were obtained from samples aliquoted in 96-well plates using a RealPlex
quantitative PCR instrument from Eppendorf (Eppendorf, NY, USA), with Sypro
orange dye from Invitrogen as the fluorescent probe. A uniform final dye
concentration of 5× (supplied as a 5000× stock solution) was used in all
experiments. The dye was excited at 465 nm and emission was recorded at
580 nm using the instrument's filters. A heating ramp of 1°C/min from 25°C
to 74°C was used, and one data point was acquired for each degree increment.
For standardization, different buffers and pH values were checked.
Thereafter, 100 mM HEPES pH 7.3 and 150 mM NaCl were used in all unfolding
experiments. The volume of each reaction was 20 µl, and appropriate dye and
protein controls were included. All experiments were done with a minimum of
two replicates, with the mean value considered for further analysis.
Several drugs/small molecules interact with Sypro orange and lead to
aberrant signal enhancements. An additional control to rule out drug-dye
interaction was therefore carried out with all the constituents kept
constant except for the protein of interest. The protein and protein-drug
curves were reported after subtracting the respective dye-alone and
drug-dye curves.
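The replicate averaging and control subtraction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: curves are lists of intensities sampled at the same temperatures, replicates are averaged pointwise before subtraction, and the function names and numbers are hypothetical.

```python
def mean_curve(replicates):
    """Pointwise mean of replicate melting curves (lists of equal length)."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def background_corrected(sample_replicates, control_replicates):
    """Mean sample curve minus mean control curve at each temperature point."""
    sample = mean_curve(sample_replicates)
    control = mean_curve(control_replicates)
    return [s - c for s, c in zip(sample, control)]


# Hypothetical intensities at three temperatures, two replicates each.
protein_drug = [[120.0, 300.0, 900.0], [130.0, 320.0, 880.0]]
drug_dye = [[20.0, 25.0, 30.0], [30.0, 35.0, 40.0]]

print(background_corrected(protein_drug, drug_dye))
```

The same function would apply to a protein-only sample with the dye-alone control as background.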

Each melting curve was assigned a quality score (Q), the ratio of the
melting-associated increase in fluorescence (ΔF_melt) to the total
fluorescence range (ΔF_total). Q = 1 indicates a high-quality curve, while
Q = 0 indicates no thermal transition [75]. Although no arbitrary Q value
cutoff was applied to judge curve quality, the curves were manually curated
and their Q values reported. A substantial fraction of the ligands tested
against the various proteins displayed no thermal transition (Q = 0) or
showed multi-step unfolding behavior. These were ignored (see Table 1).

Data analysis 

Subsequent to standardization (see SI Methods), the validity of the top 1%
of FINDSITE^comb's predictions on the test set of eight diverse proteins
was examined. To be conservative, we focused only on those protein/ligand
pairs showing single sigmoidal thermal transition curves. A fit to
Boltzmann's equation (Eq. 4) was employed to estimate the melting
temperature, T_m, from the observed intensity, I:

$$I = I_{min} + \frac{I_{max} - I_{min}}{1 + e^{(T_m - T)/a}} \qquad (4)$$

I_min and I_max are the minimum and maximum intensities; a denotes the
slope of the curve at the transition midpoint temperature, T_m [13]. To
estimate thermodynamic parameters, both van't Hoff [76] and Gibbs-Helmholtz
analyses were done [77]. To estimate the approximate ligand-binding affinity
at T_m, Eq. (2) from reference [78] was used with slight modifications;
ΔC_p is ignored.



(5) 



K_L(T_m) is the ligand association constant and [L_Tm] is the free ligand
concentration at T_m ([L_Tm] ≈ [L]_total when [L]_total >> the total
concentration of protein). K_d is the inverse of K_L(T_m).
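To illustrate how a melting temperature can be extracted with the Boltzmann sigmoid described in this section, the Python sketch below fits T_m and a to a single-transition curve. It is not the procedure used in the study, whose fitting software is not specified here: the brute-force grid search stands in for a proper nonlinear least-squares routine, I_min and I_max are simply taken from the data, and all names and synthetic values are assumptions.

```python
import math


def boltzmann(T, i_min, i_max, t_m, a):
    """Boltzmann sigmoid: fluorescence intensity at temperature T."""
    return i_min + (i_max - i_min) / (1.0 + math.exp((t_m - T) / a))


def fit_tm(temps, intensities, tm_step=0.5):
    """Grid-search least-squares estimate of (t_m, a) for one melting curve."""
    i_min, i_max = min(intensities), max(intensities)   # taken from the data
    a_grid = [0.5 + 0.1 * k for k in range(46)]         # candidate slopes 0.5 .. 5.0
    n_steps = int(round((max(temps) - min(temps)) / tm_step))
    best = None
    for k in range(n_steps + 1):
        t_m = min(temps) + k * tm_step
        for a in a_grid:
            sse = sum((boltzmann(T, i_min, i_max, t_m, a) - y) ** 2
                      for T, y in zip(temps, intensities))
            if best is None or sse < best[0]:
                best = (sse, t_m, a)
    return best[1], best[2]


# Synthetic curves sampled once per degree from 25 to 74 degrees C.
temps = list(range(25, 75))
protein_alone = [boltzmann(T, 100.0, 1000.0, 52.0, 2.0) for T in temps]
protein_ligand = [boltzmann(T, 100.0, 1000.0, 55.0, 2.0) for T in temps]

tm_apo, _ = fit_tm(temps, protein_alone)
tm_holo, _ = fit_tm(temps, protein_ligand)
print(tm_apo, tm_holo, tm_holo - tm_apo)   # thermal shift = difference in T_m
```

The difference between the two fitted T_m values is the thermal shift attributed to the ligand. For noisy experimental curves, a dedicated curve-fitting routine with free I_min and I_max would be the more robust choice.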

To eliminate the possibility of thermal shifts arising because organic
molecules form colloidal aggregates [79], the complete NCI set was compared
to the database of known aggregators maintained at
http://advisor.bkslab.org/search/. Since the thermal shift assay is
incompatible with the presence of detergents (the method of choice for
eliminating aggregation-based thermal shifts), we limited ourselves to
estimating chemical similarity to known aggregators. At a stringent TC
cutoff of 0.9, none of the molecules reported as possessing either binding
or antimicrobial/cytotoxic activities are similar to known aggregators.

Antimicrobial and cytotoxic assays on cancer cell lines 

Antimicrobial and anti-cancer tests were performed as in [80]. DHFR binders
were tested on E. coli DH5α [positive control: Nitrofurantoin (10 mg/ml in
DMSO), negative control: DMSO], multi-drug resistant E. coli SMS-3-4 (ATCC
BAA-1743) (MDREC) [positive control: Nitrofurantoin (10 mg/ml in DMSO),
negative control: DMSO], methicillin-resistant S. aureus (ATCC 33591)
(MRSA) [positive control: Vancomycin (10 mg/ml in DMSO), negative control:
DMSO], vancomycin-resistant E. faecium (ATCC 700221) (VREF) [positive
control: Chloramphenicol (10 mg/ml in DMSO), negative control: DMSO], and
colon carcinoma cells HCT-116 [positive control: etoposide (20 µg/ml in
DMSO), negative control: DMSO]. Phosphatase (1000001 and 1000006) binders
and tryptophanyl tRNA synthetase binders were tested on the colon carcinoma
cell line HCT-116.

Additional file 



Additional file 1: Detailed FINDSITE^comb VLS results, thermal shift
assay standardization: methods and results, HTS protocol table,
detailed results on the thermal shift assay and biological activity
assay for the eight proteins in tabular form, discussion on the
differences between 1000001 and 1000006 VLS and experimental
overlap, and figure depicting the diversity of compounds picked up
by the current methodology.